GH-15483: [C++] Add a Fixed Shape Tensor canonical ExtensionType #8510
jorisvandenbossche merged 39 commits into apache:main
Currently, only the shape is stored. Is this enough? Does that assume a fixed row-major order?
I think we either assume that or also store strides / dimension order. I am not sure how dimension-order changes are handled in other frameworks (TF, PyTorch, etc.), but I would assume they don't reorder tensors in memory. So I would go for storing strides.
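To make the row-major question concrete, here is a minimal Python sketch of how byte strides follow from the shape under a row-major assumption, and why a transposed view, which only permutes the strides, cannot be described by its shape alone. The helper names (row_major_strides, flat_offset, transpose) are hypothetical illustrations, not part of this PR or of pyarrow.

```python
def row_major_strides(shape, itemsize):
    """Byte strides for a C-contiguous (row-major) layout of `shape`."""
    strides = []
    step = itemsize
    # The last dimension varies fastest, so walk the shape from the end.
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def flat_offset(index, strides):
    """Byte offset of a multi-dimensional index given per-dimension strides."""
    return sum(i * s for i, s in zip(index, strides))


def transpose(shape, strides):
    """Reverse the axes by permuting shape and strides; no data is moved."""
    return tuple(reversed(shape)), tuple(reversed(strides))


# A (2, 3) tensor of 4-byte values stored in row-major order.
shape, strides = (2, 3), row_major_strides((2, 3), 4)   # strides == (12, 4)
t_shape, t_strides = transpose(shape, strides)          # (3, 2), (4, 12)

# The same element is reached through either view.
assert flat_offset((1, 2), strides) == flat_offset((2, 1), t_strides)
```

For the transposed view the strides are (4, 12), whereas a row-major (3, 2) tensor would have strides (8, 4). A reader that only sees the shape would therefore compute the wrong offsets unless the data is copied into row-major order or the strides (or a dimension order) are stored alongside the shape.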
In the context of testing metadata equality across multiple Parquet files in a dataset, equality on shape and strides may be a very strict requirement. Would relaxing the equality requirement to only compare the number of tensor dimensions negatively impact the design?
Good point. By tensor dimensions you mean shape, right?
I was thinking even looser:

    def __eq__(self, other):
        return len(self.shape) == len(other.shape)

Done.
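As a minimal sketch of what the relaxed rule proposed above means in practice, the hypothetical TensorTypeInfo class below (a stand-in that holds only a shape, not the actual extension type) contrasts strict shape equality with equality on the number of dimensions. It assumes, as the proposal implies, that __hash__ must stay consistent with the relaxed __eq__.

```python
class TensorTypeInfo:
    """Hypothetical stand-in for the tensor type's metadata (shape only)."""

    def __init__(self, shape):
        self.shape = tuple(shape)

    def strict_equals(self, other):
        # Strict rule: shapes must match exactly.
        return self.shape == other.shape

    def __eq__(self, other):
        # Relaxed rule: only the number of dimensions must match.
        if not isinstance(other, TensorTypeInfo):
            return NotImplemented
        return len(self.shape) == len(other.shape)

    def __hash__(self):
        # Equal objects must hash equally, so hash on the dimension count.
        return hash(len(self.shape))


file_a = TensorTypeInfo((2, 3))
file_b = TensorTypeInfo((4, 5))
file_c = TensorTypeInfo((2, 3, 4))
print(file_a == file_b, file_a.strict_equals(file_b), file_a == file_c)
```

The trade-off is that two types comparing equal under the relaxed rule are not necessarily interchangeable for reading values, so any code that relies on type equality to reinterpret buffers would still need to check the full shape.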
@jorisvandenbossche @sjperkins @pitrou is there interest in getting this in?
Currently we don't ship any standard extension types. I recommend discussing this on the mailing list.
FYI, the Ray project created its own Tensor type:
Indeed, I think having a built-in Tensor value type (implemented using extension arrays) in Arrow/pyarrow would be better than having third-party projects roll their own.
@wesm would there be interest in folding the pandas side of these third-party extensions into pandas as well?
That will be something to discuss in the pandas project.
Thanks for the fast review @jorisvandenbossche. I've addressed your points.
@jorisvandenbossche it seems CI is satisfied.
@rok would it be possible to add a method for
AlenkaF left a comment:
I have connected it to Python and it works well 👍
It would be great to have a method for value_type, though, as mentioned in a previous comment.
@AlenkaF sure, I'll add it.
@AlenkaF done.
@rok, you are awesome! 👍
jorisvandenbossche left a comment:
Is the failing CI unrelated? (It seems the R failures are being worked on, and the C++ failures are related to the LLVM update in #34768.)
Great news!
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
They do seem unrelated, and I don't think they obscure any new problems, as the change was fairly minimal.
Merged after 2.5 years ;) Thanks @rok!
Thanks for all the input and reviews everyone, very happy to see this merged! @jorisvandenbossche now let's talk about strides @ #34797 :D
Benchmark runs are scheduled for baseline = 81c828e and contender = a84a39b. a84a39b is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
We started a mailing list discussion about potential |